58 research outputs found
Automated Protein Structure Classification: A Survey
Classification of proteins based on their structure provides a valuable
resource for studying protein structure, function and evolutionary
relationships. With the rapidly increasing number of known protein structures,
manual and semi-automatic classification is becoming ever more difficult and
prohibitively slow. Therefore, there is a growing need for automated, accurate
and efficient classification methods to generate classification databases or
increase the speed and accuracy of semi-automatic techniques. Recognizing this
need, several automated classification methods have been developed. In this
survey, we overview recent developments in this area. We classify different
methods based on their characteristics and compare their methodology, accuracy
and efficiency. We then present a few open problems and explain future
directions.
Comment: 14 pages, Technical Report CSRG-589, University of Toronto
Event Prediction using Case-Based Reasoning over Knowledge Graphs
Applying link prediction (LP) methods over knowledge graphs (KG) for tasks
such as causal event prediction presents an exciting opportunity. However,
typical LP models are ill-suited for this task as they are incapable of
performing inductive link prediction for new, unseen event entities and they
require retraining as knowledge is added or changed in the underlying KG. We
introduce a case-based reasoning model, EvCBR, to predict properties about new
consequent events based on similar cause-effect events present in the KG. EvCBR
uses statistical measures to identify similar events and performs path-based
predictions, requiring no training step. To generalize our methods beyond the
domain of event prediction, we frame our task as a 2-hop LP task, where the
first hop is a causal relation connecting a cause event to a new effect event
and the second hop is a property about the new event which we wish to predict.
The effectiveness of our method is demonstrated using a novel dataset of
newsworthy events with causal relations curated from Wikidata, where EvCBR
outperforms baselines including translational-distance-based, GNN-based, and
rule-based LP models.
Comment: published at WWW '23: Proceedings of the ACM Web Conference 2023.
Code base: https://github.com/solashirai/WWW-EvCB
Improving Neural Ranking Models with Traditional IR Methods
Neural ranking methods based on large transformer models have recently gained
significant attention in the information retrieval community, and have been
adopted by major commercial solutions. Nevertheless, they are computationally
expensive to create, and require a great deal of labeled data for specialized
corpora. In this paper, we explore a low-resource alternative, a
bag-of-embedding model for document retrieval, and find that it is competitive
with large transformer models fine-tuned on information retrieval tasks. Our
results show that a simple combination of TF-IDF, a traditional keyword
matching method, with a shallow embedding model provides a low cost path to
compete well with the performance of complex neural ranking models on 3
datasets. Furthermore, adding TF-IDF measures improves the performance of
large-scale fine-tuned models on these tasks.
Comment: Short paper, 4 pages
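The abstract above describes combining TF-IDF with a shallow bag-of-embedding model but does not say how the two scores are merged. The following is only a minimal sketch under stated assumptions: it interpolates a TF-IDF cosine score and a mean-of-word-embeddings cosine score with a hypothetical weight `alpha`. The function names, the toy embedding table `emb`, and the linear interpolation are illustrative choices, not the paper's method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a sparse {term: weight} TF-IDF vector for each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that every term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine_sparse(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_embedding(tokens, emb, dim):
    """Bag-of-embeddings: average the vectors of all tokens found in `emb`."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_dense(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, docs, emb, dim, alpha=0.5):
    """Rank documents by alpha * lexical score + (1 - alpha) * embedding score.

    `alpha` is an assumed interpolation weight, not a value from the paper.
    Returns a list of (score, doc_index) pairs, best first.
    """
    doc_vecs, idf = tfidf_vectors(docs)
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    q_emb = mean_embedding(query, emb, dim)
    scored = []
    for i, doc in enumerate(docs):
        lexical = cosine_sparse(q_vec, doc_vecs[i])
        semantic = cosine_dense(q_emb, mean_embedding(doc, emb, dim))
        scored.append((alpha * lexical + (1 - alpha) * semantic, i))
    return sorted(scored, reverse=True)


# Toy data: two tokenized documents and a made-up 2-dimensional embedding table.
docs = [["protein", "structure"], ["neural", "ranking", "model"]]
emb = {
    "protein": [0.0, 1.0],
    "structure": [0.0, 1.0],
    "neural": [1.0, 0.0],
    "ranking": [1.0, 0.0],
    "model": [0.9, 0.1],
}
ranked = hybrid_rank(["neural", "ranking"], docs, emb, dim=2)
```

In practice the embedding table would come from a pre-trained shallow model, and `alpha` would be tuned on held-out queries; both are left as assumptions here.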
A Cross-Domain Evaluation of Approaches for Causal Knowledge Extraction
Causal knowledge extraction is the task of extracting relevant causes and
effects from text by detecting the causal relation. Although this task is
important for language understanding and knowledge discovery, recent works in
this domain have largely focused on binary classification of a text segment as
causal or non-causal. In this regard, we perform a thorough analysis of three
sequence tagging models for causal knowledge extraction and compare them with a
span-based approach to causality extraction. Our experiments show that
embeddings from pre-trained language models (e.g. BERT) provide a significant
performance boost on this task compared to previous state-of-the-art models
with complex architectures. We observe that span-based models perform better
than simple sequence tagging models based on BERT across all 4 data sets from
diverse domains with different types of cause-effect phrases.
OM-2017: Proceedings of the Twelfth International Workshop on Ontology Matching
Ontology matching is a key interoperability enabler for the semantic web, as well as a useful tactic in some classical data integration tasks dealing with the semantic heterogeneity problem. It takes ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging, data translation, query answering or navigation on the web of data. Thus, matching ontologies enables the knowledge and data expressed with the matched ontologies to interoperate.
Ontology Matching: OM-2018: Proceedings of the ISWC Workshop
LakeBench: Benchmarks for Data Discovery over Data Lakes
Within enterprises, there is a growing need to intelligently navigate data
lakes, specifically focusing on data discovery. Of particular importance to
enterprises is the ability to find related tables in data repositories. These
tables can be unionable, joinable, or subsets of each other. There is a dearth
of benchmarks for these tasks in the public domain, with related work targeting
private datasets. In LakeBench, we develop multiple benchmarks for these tasks
by using the tables that are drawn from a diverse set of data sources such as
government data from CKAN, Socrata, and the European Central Bank. We compare
the performance of 4 publicly available tabular foundational models on these
tasks. None of the existing models had been trained on the data discovery tasks
that we developed for this benchmark; not surprisingly, their performance shows
significant room for improvement. The results suggest that the establishment of
such benchmarks may be useful to the community to build tabular models usable
for data discovery in data lakes.
Proceedings of the 15th ISWC workshop on Ontology Matching (OM 2020)
15th International Workshop on Ontology Matching co-located with the 19th International Semantic Web Conference (ISWC 2020)
Proceedings of The Tenth International Workshop on Ontology Matching (OM-2015)
- …